Abstract

Background: HLA-DPs are class II MHC proteins mediating immune responses to many diseases. Peptides bind
MHC class II proteins in the acidic environment within endosomes. Acidic pH markedly elevates association rate
constants, but dissociation rates are almost unchanged in the pH range 5.0 - 7.0. This pH-driven effect can be
explained by the protonation/deprotonation states of histidine, whose imidazole has a pKa of 6.0. At pH 5.0, the
imidazole ring is protonated, making histidine positively charged and very hydrophilic, while at pH 7.0 imidazole
is unprotonated, making histidine less hydrophilic. We develop here a method to predict peptide binding to the
four most frequent HLA-DP proteins: DP1, DP41, DP42 and DP5, using a molecular docking protocol. Dockings of
virtual combinatorial peptide libraries were performed at pH 5.0 and pH 7.0.

Results: The X-ray structure of the peptide - HLA-DP2 protein complex was used as a starting template to model 
by homology the structure of the four DP proteins. The resulting models were used to produce virtual 
combinatorial peptide libraries constructed using the single amino acid substitution (SAAS) principle. Peptides were 
docked into the DP binding site using AutoDock at pH 5.0 and pH 7.0. The resulting scores were normalized and 
used to generate Docking Score-based Quantitative Matrices (DS-QMs). The predictive ability of these QMs was 
tested using an external test set of 484 known DP binders and compared with existing servers for DP
binding prediction. The models derived at pH 5.0 predict better than those derived at pH 7.0 and show
significantly improved predictions for three of the four DP proteins when compared to the existing servers. They
are able to recognize about 50% of the known binders in the top 5% of predicted peptides.

Conclusions: The higher predictive ability of DS-QMs derived at pH 5.0 may be rationalised by the additional
hydrogen bond formed between the backbone carbonyl oxygen belonging to the peptide position before p1 (p-1)
and the protonated ε-nitrogen of His79β. Additionally, protonated His residues are well accepted at most of the
peptide binding core positions, which is in good agreement with the overall negatively charged peptide binding
site of most MHC proteins.



Background

Major histocompatibility complex class II (MHC class II)
proteins are normally found in B lymphocytes, dendritic
cells, and macrophages; they are primarily involved in
processing foreign, extracellular antigens, which are
endocytosed and then enclosed in endosomes containing acid
proteases. The pH in endosomes ranges from 4.5 to 6.0 [1].
In endosomes, antigens are degraded into oligopeptides.

MHC class II proteins are synthesized in the endoplasmic
reticulum (ER) and bind to a protein known as the MHC
class II-associated invariant chain (Ii). Ii facilitates the
export from the ER of MHC class II proteins and prevents
binding to peptides resident in the ER. The complex
MHC-Ii enters a specific endocytic compartment, called
MIIC (MHC class II compartment), which fuses with
endosomes. Ii is cleaved initially to the so-called CLIP
fragment, with CLIP later being displaced by high-affinity
peptides.

Peptides binding to MHC class II vary in length from
12 to 25 amino acids, yet the binding site accepts only 
nine peptide residues, the rest extending from both ends 
as the cleft is open-ended. The side chains of bound 
peptides project into several binding pockets while a 
system of hydrogen bonds forms between the peptide 
backbone and the side chain atoms of the MHC [2] . 

Human MHCs, known as HLA (Human Leukocyte 
Antigens), are extremely polymorphic and polygenic. 
The IMGT/HLA database lists over 1,600 HLA class II 
proteins [3]. HLA class II comprises three loci: DR, DQ
and DP. DR and DQ proteins are well studied, while DP 
was initially considered of lesser importance in immune 
responses. However, it is now clear that HLA-DP pro- 
teins have important roles in mediating the immune 
response to many diseases, such as graft-versus-host 
(GVH) disease [4], sarcoidosis [5], juvenile chronic arth- 
ritis [6], Graves' disease [7], hard metal lung disease [8] 
and especially, chronic beryllium disease [9]. Recently, 
the X-ray structure of HLA-DP2 (DPA1*0103,
DPB1*0201) in complex with a self-peptide derived from
the HLA-DR α-chain has been determined [10].
Although the overall structure of DP2 is similar to that
of other MHC class II proteins, it contains a unique
solvent-exposed acidic pocket containing three glutamic
acids (Glu26β, Glu68β and Glu69β). This pocket may be
able to bind beryllium and present it to T cells, providing
a mechanistic explanation that underlies chronic
beryllium disease [10,11]. X-ray data also reveal that
the DP2 binding site comprises four binding pockets:
deep, hydrophobic pockets p1 and p6; large, shallow,
negatively charged p4; and deep, narrow and polar
pocket p9.

Peptides bind to MHC class II proteins in an acidic
environment (pH ~5.0). Bell-shaped profiles with optima
at pH 5.0 are observed in many peptide - MHC
class II binding experiments [12-14]. Acidic pH markedly
elevates association rate constants (40-fold); dissociation
rates are, by contrast, almost unchanged in the
pH range 5.0 - 7.0 [13]. The equilibrium binding level
is thus enhanced at pH 5.0. The influence of pH on the
binding equilibrium can be explained by subtle conformational
changes due to altered protonation and
deprotonation states and near-neighbor interactions.
The only amino acid sensitive to pH in the range 5.0 -
7.0 is histidine. The side-chain pKa of the His imidazole
is 6.0. At pH 5.0 imidazole is protonated and His is
thus positively charged and very hydrophilic. At pH 7.0,
imidazole is unprotonated, making His less hydrophilic.
Thus, a pair of amino acids consisting of His and a
hydrophobic residue could function as a pH-sensitive
"His button" [14]. It "closes" at pH 7.0 (hydrophobic
interaction) and "opens" at pH 5.0 (hydrophobic - charge
repulsion). Such a pH-sensitive switch was observed for
His33α in the formation of HLA-DR1 - HLA-DM
complexes [14].
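
The protonation argument above can be made quantitative with the Henderson-Hasselbalch relation. The following is a minimal sketch, assuming an isolated imidazole group with the pKa of 6.0 quoted in the text; the function name `protonated_fraction` is illustrative, and the actual pKa of a His residue inside a protein environment may differ.

```python
def protonated_fraction(ph: float, pka: float = 6.0) -> float:
    """Fraction of imidazole groups carrying a proton at a given pH.

    Henderson-Hasselbalch: [HisH+] / ([His] + [HisH+]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is the value quoted in the text.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for ph in (5.0, 7.0):
    # Roughly 91% of His side chains are protonated at pH 5.0, roughly 9% at pH 7.0.
    print(f"pH {ph}: protonated fraction = {protonated_fraction(ph):.3f}")
```

Under this idealisation, most His side chains are charged at pH 5.0 and most are neutral at pH 7.0, which is why the two pH values are treated here as two distinct protonation states.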

There are five His residues in the HLA-DP binding
cleft: four belong to the α-chain (positions 5, 16, 44 and
79) and one to the β-chain (position 79). All five histidine
residues are conserved among DP proteins. The His79β
side chain contacts the bound peptide in the vicinity
of peptide position 2. Recently, a favorable π-π stacking
between the aromatic rings of His79β and peptide His2 was
identified [15]. The other His residues are remote from
the binding site and do not make contact with the
bound peptide.

Molecular docking is a key structure-based method
with significant utility in drug design, bioinformatics,
and immunoinformatics. In contrast to sequence-based
approaches, virtual docking experiments do not require
extensive pre-existing experimental data. The only
information necessary is a reliable model of the peptide -
MHC protein complex, as provided by X-ray crystallography.
Docking methodology allows the development
of predictive models where the training and test data
are fully independent, thus eliminating any possibility
of over-fitting. We use rigid docking to identify optimised
bound peptide conformations, since even for a
nonamer a fully unconstrained peptide docking would
take prohibitively long. Because
the number of distinct peptide conformations observed
within currently known X-ray structures remains very
small, we make the parsimonious and reasonable
assumption that peptides will bind in a similar
conformation. Molecular docking has been extensively and
rigorously tested on both peptide-MHC class I and
peptide-MHC class II complexes. As an approach to
evaluating peptide binding to MHCs, it has proved to be
rapid, accurate, and reliable [15-17].

Recently, we applied a molecular docking protocol to a 
library of modeled peptide-DP2 complexes to assess the 
contribution of each of the 20 naturally occurring amino
acids at each of the nine binding core positions and four 
flanking residues (two at both ends) [15]. The normal- 
ized binding scores formed a quantitative matrix (QM), 
known also as a position-specific scoring matrix (PSSM). 
PSSMs are a commonly used representation of motifs 
or patterns within biological sequences. The predictive 
ability of the derived QM was assessed using an external 
test set of known binders to DP2. A comparison to pre- 
dictions made by existing servers for DP2 binding 
prediction indicated an improvement in performance 
offered by our docking score-based QM (DS-QM) [15]. 

In the present study, we modelled by homology four
of the most frequent HLA-DP proteins [18]: DP1
(DPA1*0201/DPB1*0101), DP41 (DPA1*0103/DPB1*0401),
DP42 (DPA1*0103/DPB1*0402) and DP5 (DPA1*0201/
DPB1*0501). We applied a similar docking protocol to
derive DS-QMs for peptide binding prediction [15]. To
investigate the influence of pH on the predictive ability,
different QMs were derived at two pH values: 5.0 and
7.0. The QMs were validated using external test sets and
compared to other servers for DP binding prediction.
Additionally, in order to analyze the peptide-MHC protein
interaction interface, a single docking of HLA-DP2
(DPA1*0103, DPB1*0201) in complex with a self-peptide
derived from the HLA-DR α-chain (pdb code: 3lqz) was
analyzed using RosettaDock [19]. This affords a
detailed view of the different amino acid
preferences at each position of peptides binding DP
proteins.

Table 1 Alignment of HLA-DPA1 (α chain) and HLA-DPB1 (β chain) for the five most frequent DP proteins

Table 2 The most sensitive models for HLA-DP peptide
binding prediction at threshold of top 5%

        pH 7.0                               pH 5.0
DP      model         sensitivity  AUC       model       sensitivity  AUC
DP1     p2p7p8        0.426        0.865     p3p7p8      0.455        0.860
DP41    p1p2          0.490        0.886     p1          0.497        0.864
DP42    p1p3p4p5p6p7  0.471        0.883     p1p2p3p8p9  0.504        0.900
DP5     p5p6          0.514        0.880     p1p2p5p7p9  0.523        0.883

Figure 2 Sensitivities of the predictions calculated at threshold
of top 5% predicted binders (a) and AUC values (b) by different
servers for HLA-DP binding prediction.



Methods 

Input data 

The X-ray structure of the HLA-DP2 (DPA1*0103,
DPB1*0201) protein, in complex with a self-peptide
derived from the HLA-DR α-chain, was used as the
starting structure for homology modelling [10]. The
covalently bound peptide was separated and defined as 
chain C. It consists of nine binding core positions 
(FHYLPFLPS) and six flanking residues (RK at the N 
terminus and TGGS at the C terminus). The conform- 
ation of the protein was used to model by homology the 
four HLA-DP proteins. The conformation of the bound 
peptide was used as a template for the modelling of four 
virtual combinatorial peptide libraries. 

Homology modelling 

Models of four HLA-DP proteins were built using the 
X-ray structure of HLA-DP2 protein (pdb code: 3lqz) as 
the template for homology modelling. HLA-DP proteins used
were: DP1 (DPA1*0201/DPB1*0101), DP41 (DPA1*0103/
DPB1*0401), DP42 (DPA1*0103/DPB1*0402), and DP5
(DPA1*0201/DPB1*0501). The polymorphic amino acids
among the first 80 amino acids from chain α (DPA1) and
the first 90 amino acids from chain β (DPB1) were
mutated accordingly. The resulting structure, in complex 
with the native peptide from the starting X-ray structure, 
was subjected to energy minimization by simulated 
annealing using the AMBER force field [20]. Each peptide- 
DP protein complex was used as a starting structure for 
generating the corresponding virtual peptide library. 

Figure 3 Normalized FEB values for protonated and nonprotonated His residues at each of the nine peptide binding core positions.
Protonated His is strongly preferred in most positions.

Figure 4 At pH 5.0 an additional hydrogen bond between the
backbone carbonyl oxygen from the peptide position (-1)
(here LYS-2) and the imidazole ε-nitrogen of protein His79β
is formed.



Combinatorial peptide library 

The nine positions forming the peptide binding core were
examined. Four peptide libraries, each consisting of 172
peptides (19 amino acids x 9 positions + 1 original ligand),
were built using PyMOL [21]. The SAAS (single amino
acid substitution) approach was used to model the
conformation of each altered side chain: after substitution,
the peptide was minimized while keeping the MHC protein
rigid. The protonation state of ionisable protein side
chains was assigned to a standard ionisation state: neutral
for His; positively charged for Arg and Lys; and negatively
charged for Asp and Glu [22]. In the case of docking at
pH 5.0, His was considered to be positively charged.
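
The size of each library follows directly from the SAAS principle. As a minimal sketch (sequence enumeration only; the structural modelling in PyMOL and the minimization are not reproduced, and the function name `saas_library` is illustrative), the 172 sequences for the native binding core FHYLPFLPS can be listed as follows:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def saas_library(core: str) -> list[str]:
    """Enumerate the native core plus every single amino acid substitution."""
    library = [core]  # the original ligand
    for pos, native in enumerate(core):
        for aa in AMINO_ACIDS:
            if aa != native:  # 19 alternatives per position
                library.append(core[:pos] + aa + core[pos + 1:])
    return library


library = saas_library("FHYLPFLPS")
print(len(library))  # 19 substitutions x 9 positions + 1 original = 172
```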



AutoDock protocol 

A parallelized version of AutoDock 4.2 [23], employing an
implementation of the Lamarckian genetic algorithm
(GA), was used to model the peptide binding to HLA-DPs.
All simulations were run on the IBM Blue Gene/P
of the Bulgarian Supercomputing Centre. The input
ligands for AutoDock 4.2 were prepared using
tools developed in-house in C# and .NET. The
output data were mined by Python scripts using the
MGL Tools 1.5.4 package [24]. All retained poses
considered in the study had an RMSD below 2.0 Å.
To limit the computational burden of calculating
peptide-MHC interactions at positions not involved in
the static docking, all coordinates were kept fixed
apart from the peptide residues of interest, which
were left flexible. All GA settings were kept at their
default values, apart from the number of energy
evaluations and the number of generations, which were
set to 2 500 000 and 27 000, respectively. The docking
grid was defined as a cuboid with respective
dimensions of 68 Å x 80 Å x 80 Å for DP1,
72 Å x 80 Å x 82 Å for DP41, 72 Å x 80 Å x 82 Å for
DP42 and 72 Å x 80 Å x 82 Å for DP5, which
encompassed the entire peptide binding site on DP. The
output from ten independent GA runs for each ligand
was processed and the pose (binding conformation)
with the lowest Free Energy of Binding (FEB) was
considered. FEB values represent the direct output
from the AutoDock 4.2 scoring function, which takes
into consideration weighted terms for van der Waals
dispersion/repulsion, hydrogen bonding, electrostatics,
desolvation interactions, and the change in torsional
free energy when the ligand goes from an unbound
to a bound state.
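
The pose-selection step can be sketched as below. This is an illustration only: the FEB values are made up, the function name `best_pose` is hypothetical, and parsing of the AutoDock output files (done in the study with MGL Tools scripts) is not shown.

```python
def best_pose(run_febs: list[float]) -> tuple[int, float]:
    """Return (run index, FEB) of the lowest-energy pose among independent GA runs."""
    return min(enumerate(run_febs), key=lambda item: item[1])


# Hypothetical FEBs (kcal/mol) from ten independent GA runs for one peptide.
febs = [-7.9, -8.4, -6.1, -8.1, -7.2, -8.3, -5.9, -7.7, -8.0, -6.6]
run, feb = best_pose(febs)
print(f"selected run {run} with FEB {feb} kcal/mol")
```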



Table 3 Pair energies in peptide binding pocket 1

DP chain  position  aa   peptide position  aa   Etotal  Eatr   Erep  Esol   Ehbnd  Epair
A         9         Tyr  1                 Phe   0.02   -0.4   0      0.41   0      0
A         24        Phe  1                 Phe  -0.46   -0.45  0     -0.01   0      0
A         32        Phe  1                 Phe  -1.69   -1.64  0     -0.05   0      0
A         43        Trp  1                 Phe  -0.01   -0.13  0      0.12   0      0
A         52        Phe  1                 Phe  -0.39   -0.42  0      0.04   0      0
A         53        Ser  1                 Phe  -0.88   -0.50  0      0.33  -0.71   0
A         54        Phe  1                 Phe  -0.75   -1.50  0.41   0.34   0      0
A         55        Glu  1                 Phe   0.01   -0.01  0      0.02   0      0
B         80        Asn  1                 Phe  -0.12   -1.52  0.03   1.38   0      0
B         83        Leu  1                 Phe  -0.11   -0.15  0      0.04   0      0
B         84        Gly  1                 Phe  -0.17   -0.21  0      0.04   0      0
sum                                             -4.55   -6.93  0.44   2.66  -0.71   0

Polymorphic residues are given in bold. Etotal corresponds to the sum of all energies between the pair residues; Eatr and Erep are the Lennard-Jones attractive and
repulsive energies, respectively; Esol is the solvation energy; Ehbnd is the energy of hydrogen bonding per residue; Epair is a statistics-based pair term.

Figure 5 Peptide binding pocket 1. The α chain residues are
shown in light blue, the β chain residues in darksalmon.
The dimorphic Asp/Gly84β is shown in green. Ser53α (given in
magenta) makes an H-bond with peptide position 1 residue Phe
(given as PHE-3 in orange).



Docking score-based quantitative matrices (DS-QMs) 

The FEBs derived from the docking experiments had 
negative and positive values. Negative FEBs correspond 
to binding peptides, while positive FEBs correspond to 
non-binding peptides. Only negative FEBs were consid- 
ered; non-binding amino acids were assigned the penalty 
score of -10. The FEBs were normalized position per 
position using the following formula: 



$$\mathrm{FEB}_{\mathrm{norm},i} = \frac{\mathrm{FEB}_i - \overline{\mathrm{FEB}}}{\mathrm{FEB}_{\max} - \mathrm{FEB}_{\min}}$$

where $\mathrm{FEB}_i$ is the binding energy of the i-th peptide,
$\overline{\mathrm{FEB}}$ is the average for a given position, and
$\mathrm{FEB}_{\max}$ and $\mathrm{FEB}_{\min}$ are the maximum and
minimum FEBs for a given position. Normalized FEBs were multiplied by
(-1) before being entered into the quantitative matrices
(QMs) for ease of presentation. Thus, positive
values correspond to preferred amino acids, and negative
values to non-preferred residues. Eight QMs were
derived: two for each HLA-DP protein, at pH 5.0 and
pH 7.0, respectively.
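
The normalization of one QM column can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that the mean, maximum and minimum are taken over the negative (binding) FEBs only, and that the penalty of -10 is written directly into the matrix for non-binding residues; the text does not state either detail explicitly. The function name `normalize_position` and the example values are hypothetical.

```python
def normalize_position(febs: dict[str, float], penalty: float = -10.0) -> dict[str, float]:
    """Turn the FEBs of one peptide position into QM weights.

    Assumptions (not confirmed by the source): statistics are computed over
    negative FEBs only; non-binders (FEB >= 0) receive the penalty as-is.
    """
    binders = [v for v in febs.values() if v < 0]
    if not binders:
        return {aa: penalty for aa in febs}
    mean = sum(binders) / len(binders)
    spread = max(binders) - min(binders)
    weights = {}
    for aa, feb in febs.items():
        if feb >= 0:
            weights[aa] = penalty  # non-binding residue
        elif spread == 0:
            weights[aa] = 0.0  # all binders identical: no preference
        else:
            # (FEB_i - mean) / (max - min), multiplied by -1 so that
            # stronger binding gives a larger positive weight.
            weights[aa] = -(feb - mean) / spread
    return weights


# Hypothetical FEBs (kcal/mol) for four residues at one position.
column = normalize_position({"A": -6.0, "F": -9.0, "L": -7.5, "K": 1.5})
print(column)
```

With these made-up inputs, Phe (the lowest FEB) receives the largest positive weight and the non-binding Lys receives the penalty.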



Test set 

Four test sets of peptides known to bind HLA-DP1,
HLA-DP41, HLA-DP42, and HLA-DP5, respectively,
were collected from the Immune Epitope Database [25]
(June 2011 release). The test set of DP1 binders contained
102 peptides originating from 60 proteins. The
DP41 test set contained 152 binding peptides from 71 
proteins. The DP42 test set contained 122 binding pep- 
tides from 66 proteins. The DP5 test set contained 108 
binding peptides from 66 proteins. The peptides had dif- 
ferent lengths. No multiple binders were used. Each pro- 
tein was represented as a set of overlapping nonamers. 
The nonameric subsequence of any known binder with 
the highest score was considered a binder; all other pro- 
tein nonamers were considered as non-binders. The 
binding score of each nonamer was calculated as a sum 
of the weights of all nine positions or of different combi- 
nations thereof. 
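
The scoring of overlapping nonamers can be sketched as below. The QM is represented here as a list of nine dictionaries (one per position), which is an assumed data layout; the toy weights and the example protein string are invented for illustration and are not values from the derived DS-QMs.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nonamers(protein: str) -> list[str]:
    """Split a protein sequence into all overlapping 9-residue windows."""
    return [protein[i:i + 9] for i in range(len(protein) - 8)]


def binding_score(nonamer: str, qm: list[dict[str, float]], positions=range(9)) -> float:
    """Sum the QM weights over all nine positions, or over a chosen subset."""
    return sum(qm[p][nonamer[p]] for p in positions)


# Toy QM: zero everywhere except two invented preferences for Phe.
toy_qm = [{aa: 0.0 for aa in AMINO_ACIDS} for _ in range(9)]
toy_qm[0]["F"] = 1.0   # position p1
toy_qm[5]["F"] = 0.5   # position p6

protein = "MKRKFHYLPFLPSTGGS"  # hypothetical sequence
scored = [(n, binding_score(n, toy_qm)) for n in nonamers(protein)]
best = max(scored, key=lambda item: item[1])
print(best)
```

Passing a subset such as `positions=(0, 1)` corresponds to a model like p1p2 in Table 2, where only some positions contribute to the score.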

The tests were performed under conditions similar to 
those which an experimental immunologist might use: 
proteins were cleaved into overlapping nonamers and the
binding score of each nonamer was predicted. Nonamers
were then ranked according to their binding score and 
the top 5% of the predicted nonamers was selected. The 
selected peptides were then compared to the known 
binders. If the nonamer sequence was part of the known 
binder sequence, the predicted peptide was considered 
as a true predicted binder. The ratio of all true predicted 
binders to all binders in the corresponding test set 
defined the sensitivity of prediction at the top 5% cut- 
off. The test sets used in the present study are given as 
Additional file 1. 
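
The sensitivity at the top 5% cut-off can be sketched as follows. This is a simplified reading of the procedure: each known binder counts as recognized if at least one nonamer in the top 5% of its source protein is contained in the binder sequence. Rounding the cut-off up and keeping at least one nonamer are assumptions of this sketch, and all names and data are hypothetical.

```python
import math


def recognized(scored: list[tuple[str, float]], binder: str, fraction: float = 0.05) -> bool:
    """True if any top-ranked nonamer is part of the known binder sequence."""
    n_top = max(1, math.ceil(len(scored) * fraction))  # assumed rounding rule
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return any(nonamer in binder for nonamer, _ in ranked[:n_top])


def sensitivity(cases: list[tuple[list[tuple[str, float]], str]], fraction: float = 0.05) -> float:
    """Recognized binders divided by all binders in the test set."""
    hits = sum(recognized(scored, binder, fraction) for scored, binder in cases)
    return hits / len(cases)


# Two hypothetical test cases: (scored nonamers of the source protein, known binder).
cases = [
    ([("FHYLPFLPS", 2.0), ("HYLPFLPST", 0.5), ("YLPFLPSTG", 0.1)], "RKFHYLPFLPSTGGS"),
    ([("AAAAAAAAA", 3.0), ("FHYLPFLPS", 1.0)], "RKFHYLPFLPSTGGS"),
]
print(sensitivity(cases))
```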



Table 4 Pair energies in peptide position 2

DP chain  position  aa   peptide position  aa   Etotal  Eatr   Erep  Esol   Ehbnd  Epair
A         9         Tyr  2                 His  -0.49   -0.69  0     0.59   -0.39   0
A         54        Phe  2                 His  -0.16   -0.25  0     0.09    0      0
B         76        Met  2                 His  -0.57   -0.63  0     0.06    0      0
B         79        His  2                 His  -1.45   -2.45  0.03  1.13    0     -0.16
B         80        Asn  2                 His  -1.94   -1.62  0     1.73   -1.95  -0.10
B         83        Leu  2                 His  -0.14   -0.19  0     0.05    0      0
sum                                             -4.75   -5.83  0.03  3.65   -2.34  -0.26

Polymorphic residues are given in bold.

Figure 6 Peptide position 2. The α chain residues are shown in
light blue, the β chain residues in darksalmon. The dimorphic Met/
Val76β is shown in green. Tyr9α and Asn80β (given in magenta) make
H-bonds with p2 residue His (given as HIS-4 in orange).



Additionally, the models were compared in terms of 
the area under the receiver operating characteristics 
curve (AUC). Two variables - sensitivity and 1 -specificity 
- were calculated at different thresholds. AUC is a quan- 
titative measure of predictive ability and varies from 0.5 
for random prediction to 1.0 for a perfect prediction. 
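
The AUC can be computed without drawing the curve, using its equivalence to the probability that a randomly chosen binder scores higher than a randomly chosen non-binder (ties count one half). The sketch below uses this rank-based formulation, which gives the same area as integrating sensitivity against 1-specificity over all thresholds; whether the authors computed it this way is not stated, and the scores shown are invented.

```python
def roc_auc(binder_scores: list[float], nonbinder_scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b > n:
                wins += 1.0   # binder ranked above non-binder
            elif b == n:
                wins += 0.5   # tie
    return wins / (len(binder_scores) * len(nonbinder_scores))


# Hypothetical scores: three of the four binder/non-binder pairs are ordered correctly.
print(roc_auc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```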

RosettaDock protocol

The RosettaDock server (http://rosettadock.graylab.jhu.edu)
was used to generate the pair interaction energies
across the peptide-DP2 protein binding interface. The
X-ray structure of the peptide-HLA-DP2 (DPA1*0103,
DPB1*0201) complex (pdb code: 3lqz) was used as input.
The RosettaDock output file contains a table of pair energies
across the binding interface. Several energy terms
are generated: Etotal is the sum of all energies between the
pair residues; Eatr and Erep are the Lennard-Jones
attractive and repulsive energies, respectively; Esol is the
solvation energy according to the Lazaridis-Karplus solvation
model [26], which penalizes buried polar groups;
Ehbnd is the hydrogen bonding energy per residue; Epair is
a statistically-based pair term derived from the PDB database,
which favours salt bridges.

Results 

Pair energies across peptide - HLA-DP2 protein binding 
interface 

The peptide-DP protein binding interface was analysed
using the RosettaDock server [19]. It consists of 39
residues: 21 residues belong to the α-chain
(9, 11, 22, 24, 32, 43, 52, 53, 54, 55, 57, 58, 62, 63,
65, 66, 68, 69, 70, 72 and 73) and 18 residues are from
the β-chain (9, 11, 12, 13, 24, 26, 28, 45, 55, 59, 65, 69, 72,
76, 79, 80, 83 and 84) [10] (Figure 1). Only five of the
residues are polymorphic among the five most frequent
DP proteins (Table 1). These are Tyr/Phe9β, Ala/Asp/
Glu55β, Lys/Glu69β, Val/Met76β and Asp/Gly84β. Asp55β
is involved in a salt bridge with peptide Ser9, while
the other polymorphic residues do not form either an
H-bond or a salt bridge with the bound peptide.

Docking score-based quantitative matrices (DS-QMs) 
for DP1, DP41, DP42 and DP5

Four libraries, each consisting of 172 peptides (19 amino 
acids x 9 positions + 1 original ligand), were built. Each 
peptide was docked separately into the corresponding 
DP rigid binding site. DS-QMs were derived based on
normalized FEB scores, as described in Methods.
Dockings were performed at two pH values: 5.0 and
7.0. Over this pH range, only His undergoes
protonation/deprotonation. At pH 5.0, His is protonated and
very hydrophilic, yet at pH 7.0 His is neutral and less 
hydrophilic. The eight DS-QMs (four at pH 5.0 and four 
at pH 7.0) derived here are given in Additional file 2. 



Table 5 Pair energies in peptide position 3

DP chain  position  aa   peptide position  aa   Etotal  Eatr   Erep  Esol  Ehbnd  Epair
A         9         Tyr  3                 Tyr  -0.53   -0.67  0     0.14   0     0
A         22        Phe  3                 Tyr  -0.03   -0.03  0     0      0     0
A         54        Phe  3                 Tyr  -0.75   -0.77  0.03  0      0     0
A         55        Glu  3                 Tyr  -0.77   -0.93  0     0.79  -0.63  0
A         57        Gln  3                 Tyr   0      -0.01  0     0.01   0     0
A         58        Gly  3                 Tyr  -0.80   -1.11  0     0.32   0     0
A         62        Asn  3                 Tyr   0.01   -0.18  0     0.18   0     0
B         76        Met  3                 Tyr  -0.28   -0.40  0     0.11   0     0
B         80        Asn  3                 Tyr   0.01   -0.07  0     0.08   0     0
sum                                             -3.14   -4.17  0.03  1.63  -0.63  0

Polymorphic residues are given in bold.

Figure 7 Peptide position 3. The α chain residues are shown in
light blue, the β chain residues in darksalmon. The dimorphic Met/
Val76β is shown in green. Glu55α (given in magenta) makes an H-bond
with p3 residue Tyr (given as TYR-5 in orange).



External validation 

A test set comprising 484 peptides known to bind HLA-DP1,
HLA-DP41, HLA-DP42 or HLA-DP5, originating
from 263 proteins, was used for external validation of
the derived DS-QMs. Initially, the sensitivity of the top
5% of the best scored peptides for each position was
assessed using DS-QMs calculated at pH 7.0 and pH 5.0.
Next, all possible combinations of different positions
were evaluated. The most predictive models among all
possible combinations between the nine peptide positions
are shown in Table 2. It is evident that almost all
positions are involved in these highly predictive models,
indicating that no peptide positions have a negligible
effect on binding. The results also indicate that the
models derived at pH 5.0 are more sensitive than
those derived at pH 7.0 for all four proteins, although
the AUC is higher only for DP42 and DP5 (Table 2).
Moreover, different peptide positions
are important for binding at different pH values.

Comparison to existing servers for HLA-DP binding
prediction

The best performing models derived here were compared
to two state-of-the-art servers for MHC class II
binding prediction: NetMHCII [27] and IEDB [28]. Both
use sequence-based models powered by artificial neural
networks (ANN). NetMHCII identifies nonamers, while
IEDB works only with 15mers. The tests were performed
as follows: protein sequences were converted into sets of
overlapping peptides (9mers for NetMHCII and 15mers
for IEDB), and the binding score of each peptide was
predicted; peptides were ranked according to their binding
score, and the top 5% of the ranked peptides were
selected and compared to known binders. If the predicted
peptide was included in the known binder
sequence, it was considered a true predicted binder. The
ratio of all true predicted binders to all binders in the
corresponding test set defined the sensitivity of prediction
at the top 5% cut-off. The sensitivities were
recorded and compared to our best predicted models
at pH 5.0 (Figure 2a). Additionally, servers were compared
in terms of AUC (Figure 2b). It is evident that our
DS-QM models out-performed state-of-the-art servers
for DP1, DP42 and DP5 proteins.

Table 6 Pair energies in peptide binding pocket 4

DP chain  position  aa   peptide position  aa   Etotal  Eatr   Erep  Esol   Ehbnd  Epair
A         9         Tyr  4                 Leu  -0.02   -0.04  0      0.03   0     0
A         62        Asn  4                 Leu  -1.05   -0.72  0      0.81  -1.13  0
B         13        Gln  4                 Leu   0.18   -1.78  0.71   1.90  -0.66  0
B         24        Phe  4                 Leu  -0.37   -0.40  0.11  -0.08   0     0
B         26        Glu  4                 Leu   0.08   -0.10  0      0.19   0     0
B         72        Val  4                 Leu  -0.76   -0.62  0.11  -0.25   0     0
B         76        Met  4                 Leu  -0.86   -0.88  0.16  -0.14   0     0
sum                                             -2.80   -4.54  1.09   2.46  -1.79  0

Polymorphic residues are given in bold.

Figure 8 Peptide binding pocket 4. The α chain residues are
shown in light blue, the β chain residues in darksalmon.
The dimorphic Met/Val76β is shown in green. Asn62α and Gln13β
(given in magenta) make H-bonds with p4 residue Leu (given as
LEU-6 in orange).

Effect of pH on peptide and protein His residues

As peptides typically bind to class II MHC proteins in
an acidic environment, with a pH between 4.5 and 5.5,
dockings were performed at pH 5.0 and pH 7.0, and
compared in terms of their predictive ability. Better
prediction was found for the DS-QMs derived at pH 5.0.



The only amino acid sensitive to pH in the range 5.0 to
7.0 is histidine. The pKa of the His imidazole is 6.0, thus
making His protonated and very hydrophilic at pH 5.0
and unprotonated and less hydrophilic at pH 7.0. The
influence of pH on the affinity of peptide binding to
HLA-DP proteins has two potential aspects: influence
on peptide protonation/deprotonation and influence on
protein binding site protonation/deprotonation. Figure 3
summarizes the normalized FEB values for protonated
and nonprotonated His residues at each of the nine peptide
binding core positions. It is clear that protonated
His residues are preferred in most peptide positions
(p3 to p9). As the peptide binding site on DP proteins is
predominantly negatively charged [29], a preference for
positively charged His was expected.

Five His residues are present in the HLA-DP binding
site: four belong to the α-chain (positions 5, 16, 44 and
79) and one belongs to the β-chain (position 79). All are
conserved among the studied DPs. Only His79β contacts
the bound peptide, in the vicinity of peptide position 2;
the other His residues are distant from the binding site.
The protonation of His79β allows an additional H-bond
to be formed between the backbone carbonyl oxygen
belonging to peptide position -1 (the position before p1)
and the imidazole ε-nitrogen of His79β (Figure 4). The
estimated N-H...O=C bond energy for polypeptides in
an aqueous environment lies within the range 1.5-2 kcal/mol
[30]. This means that the formation of this additional
H-bond can increase the binding affinity constant of the
peptide-protein complex by over an order of magnitude in
the absence of other effects. This may explain the
enhanced experimentally observed equilibrium binding
level seen at pH 5.0 [13].
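
The order-of-magnitude estimate follows from the relation between binding free energy and the affinity constant, K2/K1 = exp(ΔΔG / RT). The sketch below evaluates this factor for the quoted energy range; the temperature of 298.15 K is an assumption of this illustration (the text does not state one), and the function name `affinity_factor` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_factor(stabilization_kcal: float, temperature_k: float = 298.15) -> float:
    """Fold increase in the binding constant for a given extra stabilization energy."""
    return math.exp(stabilization_kcal / (R_KCAL * temperature_k))


for energy in (1.5, 2.0):
    # Roughly 13-fold for 1.5 kcal/mol and 29-fold for 2.0 kcal/mol at 298.15 K.
    print(f"{energy} kcal/mol -> {affinity_factor(energy):.1f}-fold")
```

Both ends of the 1.5-2 kcal/mol range give a factor above 10, consistent with the statement that a single additional H-bond can raise the affinity constant by more than an order of magnitude when no other effect intervenes.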

Discussion 

In the present study, molecular docking procedures
developed recently for peptide binding prediction to the
HLA-DP2 protein [15] were significantly extended to
include the four most frequent DP proteins [18]: DP1
(DPA1*0201/DPB1*0101), DP41 (DPA1*0103/DPB1*0401),
DP42 (DPA1*0103/DPB1*0402) and DP5 (DPA1*0201/
DPB1*0501). The X-ray structure of the peptide - HLA-DP2
protein complex was used as a starting template to
model by homology the structure of the four DP proteins.
In turn, these were used to generate combinatorial
peptide libraries built using the SAAS principle. Peptides
were docked into the DP binding site using AutoDock
at pH 5.0 and pH 7.0. The resulting scores were
recorded, normalized, and used to generate DS-QMs.
The predictive ability of these QMs was tested using an
external test set and compared to existing servers for DP
binding prediction. The models derived at pH 5.0 predict
better than those derived at pH 7.0, showing significantly
improved predictions for three of the four DP proteins
when compared to current state-of-the-art servers.
DS-QMs can recognize about 50% of the known binders in the top
5% of predicted peptides. Moreover, a single docking
of HLA-DP2 (DPA1*0103, DPB1*0201) in complex with a
self-peptide derived from the HLA-DR α-chain (pdb code:
3lqz) was analysed using RosettaDock. This characterised
more fully the interacting amino acids across
the peptide - MHC binding interface, helping to identify
amino acid preferences at each position of the
peptide binding core.

Table 7 Pair energies in peptide position 5

DP chain  position  aa   peptide position  aa   Etotal  Eatr   Erep  Esol   Ehbnd  Epair
A         62        Asn  5                 Pro   0.38   -0.86  0.25   0.99   0     0
A         65        Ile  5                 Pro  -0.29   -0.21  0     -0.08   0     0
B         13        Gln  5                 Pro  -0.06   -0.39  0      0.36  -0.02  0
B         26        Glu  5                 Pro   0.04   -0.04  0      0.08   0     0

No polymorphic residues exist here.

Figure 9 Peptide position 5. The α chain residues are shown in
light blue, the β chain residues in darksalmon. No polymorphism
exists here. Gln13β (given in magenta) makes H-bonds with p5
residue Pro (given as PRO-7 in orange).

Peptide binding pocket 1 (p1) consists of 11 residues
(Table 3). Ten of them are conserved and only
Asp/Gly84β is dimorphic (Figure 5). DP1 and DP5
contain Asp84β, while DP41 and DP42 contain Gly84β,
as does DP2. Aromatic amino acids such as Phe, Tyr,
Trp and His, as well as aliphatic Ile and Leu, are able
to bind into this pocket. Additionally, the Asp84β-containing
proteins DP1 and DP5 accept positively
charged Lys, Arg and His (when it is charged at pH
5.0). A hydrogen bond is formed between Ser53α and
the NH of peptide position 1 (p1) (Table 1).
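
How strongly His is charged at the two pH values follows from the Henderson-Hasselbalch relation. The sketch below assumes the imidazole pKa of 6.0 quoted in the Abstract and treats His as an isolated titratable group, ignoring any pKa shift caused by the protein environment; the function name `protonated_fraction` is illustrative.

```python
def protonated_fraction(ph: float, pka: float = 6.0) -> float:
    """Fraction of a titratable base (e.g. the His imidazole) that is protonated.

    Henderson-Hasselbalch: [BH+] / ([B] + [BH+]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is the value quoted in the Abstract.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    for ph in (5.0, 7.0):
        print(f"pH {ph:.1f}: {protonated_fraction(ph):.1%} protonated")
```

Under these assumptions about 91% of His side chains carry a positive charge at pH 5.0 and about 9% at pH 7.0, which is why the two pH values are modelled with different His protonation states.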

Peptide position 2 (p2) makes contacts with 6
residues of the binding site (Table 4); 5 of them are
conserved and one (Met/Val76β) is dimorphic (Figure 6).
Only DP1 contains Val76β; the remaining DPs have
Met76β. The p2 side chain protrudes up out of the binding
site close to the β chain, and a variety of amino acids
are well situated here. His at p2 makes H-bonds with
Tyr9α and Asn80β, and salt bridges with His79β and
Asn80β (Table 4). π-π stacking of aromatic rings
explains the preference for aromatic residues here
[15]. Protonated His is not favored here.

The side chain of peptide position 3 (p3) protrudes up
out of the binding site close to the α chain. It contacts 7 α-chain
residues and 2 β-chain residues, one of which is the dimorphic
Met/Val76β (Table 5). Glu55α makes a hydrogen
bond with the OH group of Tyr (Figure 7). The amino acid
preferences here are quite uniform for the four DPs: Tyr,



Table 8 Pair energies in peptide binding pocket 6

DP chain  Position  aa    Peptide position  aa      Etotal    Eatr    Erep    Esol   Ehbnd   Epair
A         9         Tyr   6                 Phe      -0.01   -0.04       0    0.03       0       0
A         11        Ala   6                 Phe      -0.31   -0.36       0    0.05       0       0
A         22        Phe   6                 Phe      -0.17   -0.16       0       0       0       0
A         62        Asn   6                 Phe      -1.09   -2.44    1.00    1.74   -1.39       0
A         63        Ile   6                 Phe      -0.01   -0.01       0       0       0       0
A         65        Ile   6                 Phe      -1.15   -1.38    0.02    0.22       0       0
A         66        Leu   6                 Phe      -0.76   -1.33    0.77   -0.21       0       0
A         69        Asn   6                 Phe       0.02   -0.05       0    0.06       0       0
B         11        Gly   6                 Phe      -0.66   -0.82       0    0.17       0       0
B         12        Arg   6                 Phe      -0.21   -0.29       0    0.08       0       0
B         13        Gln   6                 Phe      -0.31   -1.56       0    1.25       0       0
B         26        Glu   6                 Phe      -0.03   -0.06       0    0.04       0       0
B         28        Tyr   6                 Phe      -0.50   -1.43    0.35    0.58       0       0
sum                                                  -5.18   -9.93    2.14    4.01   -1.39       0

No polymorphic residues exist here.
Trp, Phe, Pro and the positively charged Arg. The Met76β-containing
DPs (DP41, DP42 and DP5) accept a protonated
His here.
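
The sum row of Table 8 can be cross-checked against the per-residue rows with a few lines of Python. The values below are transcribed from Table 8; the tolerance of 0.02 is an assumption that allows for rounding of the two-decimal entries, and the names `ROWS`, `REPORTED_SUM` and `column_sums` are illustrative.

```python
# Rows of Table 8: (chain, position, residue, Etotal, Eatr, Erep, Esol, Ehbnd, Epair)
ROWS = [
    ("A", 9,  "Tyr", -0.01, -0.04, 0.00,  0.03,  0.00, 0.0),
    ("A", 11, "Ala", -0.31, -0.36, 0.00,  0.05,  0.00, 0.0),
    ("A", 22, "Phe", -0.17, -0.16, 0.00,  0.00,  0.00, 0.0),
    ("A", 62, "Asn", -1.09, -2.44, 1.00,  1.74, -1.39, 0.0),
    ("A", 63, "Ile", -0.01, -0.01, 0.00,  0.00,  0.00, 0.0),
    ("A", 65, "Ile", -1.15, -1.38, 0.02,  0.22,  0.00, 0.0),
    ("A", 66, "Leu", -0.76, -1.33, 0.77, -0.21,  0.00, 0.0),
    ("A", 69, "Asn",  0.02, -0.05, 0.00,  0.06,  0.00, 0.0),
    ("B", 11, "Gly", -0.66, -0.82, 0.00,  0.17,  0.00, 0.0),
    ("B", 12, "Arg", -0.21, -0.29, 0.00,  0.08,  0.00, 0.0),
    ("B", 13, "Gln", -0.31, -1.56, 0.00,  1.25,  0.00, 0.0),
    ("B", 26, "Glu", -0.03, -0.06, 0.00,  0.04,  0.00, 0.0),
    ("B", 28, "Tyr", -0.50, -1.43, 0.35,  0.58,  0.00, 0.0),
]
TERMS = ("Etotal", "Eatr", "Erep", "Esol", "Ehbnd", "Epair")
REPORTED_SUM = (-5.18, -9.93, 2.14, 4.01, -1.39, 0.0)  # sum row of Table 8
TOLERANCE = 0.02  # assumed allowance for rounding of two-decimal entries


def column_sums(rows):
    """Sum each of the six energy columns over all residue rows."""
    return tuple(round(sum(row[3 + k] for row in rows), 2) for k in range(len(TERMS)))


if __name__ == "__main__":
    for name, got, reported in zip(TERMS, column_sums(ROWS), REPORTED_SUM):
        status = "ok" if abs(got - reported) <= TOLERANCE else "MISMATCH"
        print(f"{name:7s} computed {got:7.2f}  reported {reported:7.2f}  {status}")
```

The same check applies to the other pair-energy tables once their rows are entered in the same layout.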

Binding pocket 4 (p4) is large, shallow and negatively
charged due to the presence of Glu26β, Glu68β and
Glu69β [10]. It strongly attracts positively charged amino
acids such as Arg, Lys and protonated His. Leu, Tyr, Trp
and Phe are also well accepted here. Asn62α and Gln13β
make H-bonds with Leu4 (Table 6 and Figure 8). Glu68β
and Glu69β are not shown to make contacts with p4, as
Leu does not fill the pocket [10]. Surprisingly, Glu, Gln
and Asn also fit well into this pocket, making H-bonds
with Asn62α and Gln13β.

Position 5 (p5) protrudes from the binding cleft but is
still in close proximity to the negatively charged residues
Glu26β, Glu68β and Glu69β. This explains the observed
preferences for the positively charged Arg, Lys and protonated
His and the disinclination for Asp and Glu. Pro at p5
hydrogen bonds to Gln13β and contacts Asn62α, Ile65α and
Glu26β (Table 7 and Figure 9). Phe and Trp are also well
accepted at p5. No polymorphism exists here (Table 2).

Binding pocket 6 (p6) is deep and formed by 8 residues
from the α-chain and 5 residues from the β-chain
(Table 8 and Figure 10). Asn62α makes an H-bond with
the NH of Phe6. No polymorphism exists here (Table 2),
which makes the amino acid preferences at this pocket
uniform for the five DPs. Phe, Tyr, Trp and His (protonated
and nonprotonated) are well accepted here. Lys
and Arg also fit well.

Table 9 Pair energies in peptide position 7

DP chain  Position  aa    Peptide position  aa      Etotal    Eatr    Erep    Esol   Ehbnd
A         65        Ile   7                 Leu      -0.24   -0.34       0    0.10       0
A         69        Asn   7                 Leu      -1.11   -0.66       0    0.76   -1.21
B         26        Glu   7                 Leu       0.15   -0.47       0    0.62       0
B         28        Tyr   7                 Leu      -1.73   -1.36       0    0.87   -1.24
B         45        Phe   7                 Leu      -0.34   -0.29       0   -0.05       0
B         59        Trp   7                 Leu      -1.48   -1.87    0.13    0.26       0
B         65        Ile   7                 Leu      -0.33   -0.24       0   -0.10       0
B         69        Glu*  7                 Leu       0.08   -0.25       0    0.33       0
sum                                                  -5.00   -5.48    0.13    2.79   -2.45

Polymorphic residues are marked with an asterisk (*).

Figure 11 Peptide position 7. The α chain residues are shown in
light blue, the β chain residues in darksalmon. The dimorphic
Glu/Lys69β is shown in green. Asn69α and Tyr28β (given in magenta)
make H-bonds with p7 residue Leu (given as LEU-9 in orange).

The side chain of position 7 (p7) lies tangentially to
the binding site and is oriented towards the β-chain
(Table 9). It is considered to be a secondary anchor
position for some MHC class II proteins [31,32]. NH
and CO of Leu7 make H-bonds with Tyr28β and
Asn69α, respectively. The p7 side chain makes contacts
with Ile65α, Glu26β, Phe45β, Trp59β, Ile65β and Glu/Lys69β
Table 10 Pair energies in peptide position 8

DP chain  Position  aa    Peptide position  aa      Etotal    Eatr    Erep    Esol   Ehbnd   Epair
A         65        Ile   8                 Pro      -0.14   -0.13       0   -0.01       0       0
A         69        Asn   8                 Pro       0.05   -0.73       0    0.78       0       0
B         55        Asp*  8                 Pro       0.03   -0.02       0    0.05       0       0
B         59        Trp   8                 Pro      -0.19   -0.51       0    0.32       0       0
sum                                                  -0.25   -1.39       0    1.14       0       0

Polymorphic residues are marked with an asterisk (*).




Figure 12 Peptide position 8. The α chain residues are shown
in light blue, the β chain residues in darksalmon. The trimorphic
Ala/Asp/Glu55β is shown in green. No H-bonds are made with p8
residue Pro (given as PRO-10 in orange).



(Figure 11). Aliphatic residues are well accepted here.
Additionally, Asp is preferred by Lys69β-containing DP
proteins. Position 69β is dimorphic: DP1, DP41, DP42
and DP5 have Lys69β, while DP2 has Glu69β. Protonated
His is accepted better here than the unprotonated form.

Position 8 (p8) is solvent-exposed, yet shows preference
for a variety of peptide residues: Trp, Tyr, Pro, Arg,
Asn, Gly, Ala and His. Pro8 makes favourable contacts with
Ile65α and Trp59β and disfavoured contacts with Asn69α
and Asp55β (Table 10 and Figure 12). Position 55β is
polymorphic: DP1 and DP41 contain Ala; DP2 and
DP42 have Asp; and DP5 has Glu. However, this position
is situated far from the side chain of p7 and does
not influence the preferences there. Protonated His is
preferred here.

Binding pocket 9 (p9) is formed from Asn68α, Asn69α,
Leu70α, Thr72α, Leu73α, Phe/Tyr9β, Ala/Asp/Glu55β and
Trp59β (Table 11 and Figure 13). It accepts large aromatic,
polar, and even charged residues [10]. The side
chain of p9 is oriented towards the α-chain. Ser9 is too
short to fill the pocket. It makes H-bonds with Asn69α
and Thr72α. Phe, Tyr, Trp and His fit well into this pocket.
The Asp/Glu55β-containing DPs accept Arg and protonated
His.

Table 11 Pair energies in peptide binding pocket 9

DP chain  Position  aa    Peptide position  aa      Etotal    Eatr    Erep    Esol   Ehbnd   Epair
A         68        Asn   9                 Ser      -0.02   -0.05       0    0.07       0   -0.04
A         69        Asn   9                 Ser      -0.90   -1.73    0.18    1.99   -1.16   -0.18
A         70        Leu   9                 Ser          0   -0.01       0    0.01       0       0
A         72        Thr   9                 Ser       0.59   -0.68    0.98    1.13   -0.78   -0.07
A         73        Leu   9                 Ser      -0.15   -0.23       0    0.08       0       0
B         9         Phe*  9                 Ser          0       0       0       0       0       0
B         55        Asp*  9                 Ser      -0.03   -0.40       0    0.46       0   -0.09
B         59        Trp   9                 Ser      -0.03   -0.04       0    0.01       0       0
sum                                                  -0.54   -3.14    1.16    3.75   -1.94   -0.38

Polymorphic residues are marked with an asterisk (*).

The influence of pH on the affinity of peptides binding
to HLA-DP has two main aspects: its influence on peptide
protonation/deprotonation and its influence on
protonation/deprotonation of the protein binding site. At pH 5.0,
His is positively charged and is preferred at peptide positions 3 to
9. Among the five His residues in the HLA-DP binding
site, only the protonation state of His79β affects peptide
binding. At pH 5.0 an additional hydrogen bond is formed
between the backbone carbonyl oxygen of the peptide position
before p1 (p-1) and the imidazole ε-nitrogen of
His79β (Figure 4). This H-bond increases the peptide binding
affinity by more than 3 orders of magnitude, perhaps
explaining the higher experimentally observed equilibrium
binding level seen at pH 5.0 [30]. The peptide-protein
association rate constants greatly increase at pH 5.0 (~40-fold),
while the dissociation rates are almost unchanged in
the pH range 5.0 - 7.0 [13]. Thus, one may speculate that
the peptide-protein complex formed in the acidic environment
of endosomes will also be stable in the neutral environment
of the cell surface.

Conclusion

For peptide binding to the four most frequent HLA-DP
proteins (DP1, DP41, DP42 and DP5), statistically the
DS-QMs derived through molecular docking simulations
at pH 5.0 gave better predictions than those derived
at pH 7.0 and performed better than current state-of-the-art
servers for MHC binding prediction. Clear differences
are observed in our X-ray-based protein-peptide
models: an additional hydrogen bond is formed between
the backbone carbonyl oxygen belonging to the peptide
position before p1 and the protonated ε-nitrogen of
His79β. This additional hydrogen bond may provide
additional stabilization for all peptides regardless of
their sequences, provided that they have a sufficiently
long N-terminal extension. Protonated His residues
make favourable interactions at most of the peptide
binding core positions.
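
For orientation, the affinity gain of more than 3 orders of magnitude attributed in the Discussion to the additional hydrogen bond can be expressed as a free-energy difference through the standard relation ΔΔG = -RT ln(fold change). The sketch below is a unit conversion only, not a result of this study; the temperature of 298.15 K and the function name `fold_change_to_ddg` are assumptions.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change_to_ddg(fold_change: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) for an affinity improved `fold_change`-fold.

    Uses ddG = -R * T * ln(fold_change); negative values mean stabilization.
    The default temperature of 298.15 K is an assumption of this sketch.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


if __name__ == "__main__":
    print(f"{fold_change_to_ddg(1000.0):.2f} kcal/mol")
```

With these assumptions a 1000-fold gain in affinity corresponds to roughly -4.1 kcal/mol.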

Additional files 